Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning
Slack, Dean L., Moubayed, Noura Al
Most defences target the pre-training stage, leaving memorisation during fine-tuning -- especially for domain adaptation and instruction tuning -- poorly understood. We fine-tune Pythia, Llama3, and Mistral models spanning 1.4B-70B parameters on common evaluation datasets and track verbatim memorisation throughout training. We find that memorisation increases dramatically in the first few epochs, often well before either validation perplexity or evaluation performance is optimised. We use a simple but effective n-gram memorisation score that reliably precedes verbatim memorisation; using it as an early-stopping criterion mitigates memorisation with minimal performance loss. Further, we introduce an n-gram-aware loss regulariser and show that it reduces memorisation across all model families tested by up to 40% while minimising evaluation performance trade-offs compared to an existing memorisation mitigation strategy. These results yield practical, scalable insights into memorisation dynamics during language model fine-tuning.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.68)
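The abstract's n-gram memorisation score can be illustrated with a minimal sketch: the fraction of a generation's n-grams that occur verbatim anywhere in the fine-tuning data. The function names, the toy data, and the choice of n are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_memorisation_score(generated, training_corpus, n=8):
    """Fraction of the generation's n-grams that occur verbatim anywhere
    in the training corpus. Values near 1.0 suggest heavy copying."""
    train_ngrams = set()
    for doc in training_corpus:
        train_ngrams.update(ngrams(doc, n))
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    return sum(g in train_ngrams for g in gen) / len(gen)

# Toy example (hypothetical data): the generation copies part of a training doc.
train = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]]
gen = ["the", "quick", "brown", "fox", "went", "home"]
score = ngram_memorisation_score(gen, train, n=3)  # 2 of 4 trigrams match
```

Used as an early-stopping criterion, one would halt fine-tuning once this score on held-out generations crosses a chosen threshold.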
Efficient Unlearning with Privacy Guarantees
Domingo-Ferrer, Josep, Jebreel, Najeeb, Sánchez, David
Privacy protection laws, such as the GDPR, grant individuals the right to request the forgetting of their personal data not only from databases but also from machine learning (ML) models trained on them. Machine unlearning has emerged as a practical means to facilitate model forgetting of data instances seen during training. Although some existing machine unlearning methods guarantee exact forgetting, they are typically costly in computational terms. On the other hand, more affordable methods do not offer forgetting guarantees and are applicable only to specific ML models. In this paper, we present \emph{efficient unlearning with privacy guarantees} (EUPG), a novel machine unlearning framework that offers formal privacy guarantees to individuals whose data are being unlearned. EUPG involves pre-training ML models on data protected using privacy models, and it enables {\em efficient unlearning with the privacy guarantees offered by the privacy models in use}. Through empirical evaluation on four heterogeneous data sets protected with $k$-anonymity and $\epsilon$-differential privacy as privacy models, our approach demonstrates utility and forgetting effectiveness comparable to those of exact unlearning methods, while significantly reducing computational and storage costs. Our code is available at https://github.com/najeebjebreel/EUPG.
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Catalonia > Tarragona Province > Tarragona (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (1.00)
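EUPG's own mechanics are not spelled out in the abstract, but the $k$-anonymity privacy model it pre-trains on is easy to illustrate: every combination of quasi-identifier values must be shared by at least $k$ records, typically achieved by generalising attributes. The helper names and the toy data below are illustrative assumptions, not the paper's code.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values is shared
    by at least k records in the data set."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k

def generalise_age(records, width=10):
    """Coarsen exact ages into width-year buckets, a simple generalisation."""
    return [{**r, "age": r["age"] // width * width} for r in records]

# Hypothetical records: exact ages are unique, so the raw data is not
# 2-anonymous; bucketing ages into decades makes it so.
people = [{"age": a} for a in (23, 27, 25, 34, 31, 38)]
```

Pre-training on such protected data means the model never commits to any individual's exact values, which is what makes later unlearning cheap.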
An Approach Towards Learning K-means-friendly Deep Latent Representation
Clustering is a long-standing problem in data mining. Classical centroid-based approaches to clustering mainly face difficulty with high-dimensional inputs such as images. With the advent of deep neural networks, a common approach to this problem is to map the data to a latent space of comparatively lower dimension and then cluster in that space. The network architectures adopted for this are generally autoencoders (AEs) that reconstruct a given input at the output. To keep the input in a compact form, the encoder of an AE learns to extract useful features that are decoded at the reconstruction end. A well-known centroid-based clustering algorithm is K-means. In the context of deep feature learning, recent works have empirically shown the importance of learning the representations and the cluster centroids together. To this end, a continuous variant of K-means has recently been proposed, in which the softmax function is used in place of argmax so that the clustering and network parameters can be learned jointly using stochastic gradient descent (SGD). However, unlike classical K-means, where the input space stays constant, here the centroids are learned in parallel with the latent space for every batch of data. Such batch updates disagree with the concept of classical K-means, where the clustering space remains constant because it is the input space itself. To address this, we propose to alternately learn a clustering-friendly data representation and K-means-based cluster centers. Experiments on benchmark datasets show improvements of our approach over previous approaches.
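The softmax relaxation of K-means mentioned above can be sketched in a few lines of NumPy: assignments become a softmax over negative squared distances, which is differentiable and so trainable with SGD alongside the encoder. Function names and the temperature parameter tau are illustrative assumptions.

```python
import numpy as np

def soft_assign(z, centroids, tau=1.0):
    """Softmax over negative squared distances: a differentiable
    relaxation of K-means' hard argmin assignment."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def kmeans_loss(z, centroids, tau=1.0):
    """Expected squared distance under the soft assignment; as tau -> 0
    this approaches the classical K-means objective."""
    p = soft_assign(z, centroids, tau)
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return (p * d2).sum(axis=1).mean()

# Toy latent points sitting exactly on two centroids: at low temperature
# the soft assignment is nearly one-hot and the loss is near zero.
z = np.array([[0.0, 0.0], [10.0, 10.0]])
c = np.array([[0.0, 0.0], [10.0, 10.0]])
p = soft_assign(z, c, tau=0.1)
```

The paper's alternating scheme would freeze `c` while updating the encoder that produces `z`, then re-fit the centroids, rather than updating both on every batch.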
Magnificent Minified Models
Harang, Rich, Sanders, Hillary
There are many ways to make a deep neural network smaller. In this paper, we focus on three categories of model size reduction: pruning, quantization, and training smaller models from scratch. Quantization means converting model parameters to lower-precision formats, such as changing all 32-bit floating point parameters to 16-bit, which roughly halves the file size. Pruning deletes parameters or groups of parameters (such as entire neurons) from a trained model to make it smaller, often followed by a fine-tuning round of training, as done in our experiments. Parameter-level pruning (also called unstructured pruning) prunes individual parameters one at a time, whereas neuron-level pruning (also called structured pruning) prunes all parameters associated with a given neuron at once. To simplify terminology across multiple methods, we use the term 'damage' to refer broadly to the undesired impact on network performance of removing a node or zeroing a weight. Different compression methods either estimate damage directly, or rank neurons or weights in order of increasing assumed damage according to some other metric that does not directly evaluate the impact on loss or performance.
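The three reduction families described above can each be sketched in a few lines of NumPy on a single weight matrix. This is an illustrative baseline (magnitude-based ranking as the damage proxy), not the paper's code.

```python
import numpy as np

def quantise_fp16(w):
    """Quantization: cast float32 weights to float16 and back --
    half the stored bytes, at the cost of small rounding error."""
    return w.astype(np.float16).astype(np.float32)

def prune_unstructured(w, sparsity):
    """Parameter-level pruning: zero the smallest-magnitude fraction
    of individual weights (ties may zero slightly more)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

def prune_structured(w, n_neurons):
    """Neuron-level pruning: zero the output rows (neurons) with the
    smallest L2 norm, removing all their parameters at once."""
    norms = np.linalg.norm(w, axis=1)
    out = w.copy()
    out[np.argsort(norms)[:n_neurons], :] = 0.0
    return out

# Toy 3x3 weight matrix for the examples below.
w = np.arange(1.0, 10.0, dtype=np.float32).reshape(3, 3)
```

Magnitude is the classic proxy for "damage": small weights or low-norm neurons are assumed cheapest to remove, without directly evaluating the loss.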
L2PF -- Learning to Prune Faster
Vemparala, Manoj-Rohit, Fasfous, Nael, Frickenstein, Alexander, Moraly, Mhd Ali, Jamal, Aquib, Frickenstein, Lukas, Unger, Christian, Nagaraja, Naveen-Shankar, Stechele, Walter
Various applications in the field of autonomous driving are based on convolutional neural networks (CNNs), especially for processing camera data. The optimization of such CNNs is a major challenge in continuous development. Newly learned features must be brought into vehicles as quickly as possible, so it is not feasible to spend redundant GPU hours during compression. In this context, we present Learning to Prune Faster, a multi-task, try-and-learn method that discretely learns which filters of the CNN are redundant and a continuous action for how long each layer has to be fine-tuned. This allows us to significantly speed up the convergence of learning how to find an embedded-friendly, filter-wise pruned CNN. For ResNet20, we achieved a compression ratio of 3.84x with minimal accuracy degradation. Compared to the state-of-the-art pruning method, we reduced the GPU hours by 1.71x.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)